A high-quality, long-read genome assembly of the whitelined sphinx moth (Lepidoptera: Sphingidae: Hyles lineata) shows highly conserved melanin synthesis pathway genes

Abstract
The sphinx moth genus Hyles comprises 29 described species inhabiting all continents except Antarctica. The genus diverged relatively recently (40–25 MYA), arising in the Americas and rapidly establishing a cosmopolitan distribution. The whitelined sphinx moth, Hyles lineata, represents the oldest extant lineage of this group and is one of the most widespread and abundant sphinx moths in North America. Hyles lineata exhibits the large body size and adept flight control characteristic of the sphinx moth family (Sphingidae), but it is unique in displaying extreme larval color variation and broad host plant use. These traits, in combination with its broad distribution and high relative abundance within its range, have made H. lineata a model organism for studying phenotypic plasticity, plant–herbivore interactions, physiological ecology, and flight control. Although H. lineata is one of the best-studied sphinx moths, few data exist on its genetic variation or the regulation of its gene expression. Here, we report a high-quality genome showing high contiguity (N50 of 14.2 Mb) and completeness (98.2% of Lepidoptera BUSCO genes), an important first characterization to facilitate such studies. We also annotate the core melanin synthesis pathway genes and confirm that they have high sequence conservation with other moths and are most similar to those of another, well-characterized sphinx moth, the tobacco hornworm (Manduca sexta).

introduced into the lab colony every 4 to 5 years. Lab populations are kept large to maintain genetic diversity (minimum 250 adults per generation). Larvae are raised under the following conditions: 27 °C, 40–50% humidity, a 16-hour photoperiod, and ad libitum access to an artificial wheat germ-based diet (Davidowitz 2003). Adults are kept in a flight cage where they are provided with ad libitum access to a sponge saturated with a 20% sucrose solution and host plants (Oenothera caespitosa) for oviposition. In March 2022, a male puparium was shipped live from Tucson, AZ to Gainesville, FL and stored in 100% ethanol at -20 °C for two weeks prior to DNA isolation.

DNA Isolation
Thoracic tissue was removed from the puparium and homogenized in lysis buffer and proteinase K. The sample was then nutated for 20 hours at room temperature and incubated at 56 °C for one hour. Hair and scales that would otherwise clog the DNeasy mini spin column membrane were excluded by centrifuging the sample at 20,000 × g for 5 minutes and retaining the supernatant for all downstream steps, which followed the standard DNeasy Blood and Tissue Kit protocol. Wide-bore pipet tips were used to prevent DNA shearing during transfer steps.

Sequencing
Prior to sequencing, genomic DNA preparations were evaluated on the Agilent TapeStation, using a genomic DNA tape, to confirm high integrity and purity. Suitable DNA preparations typically had an A260/280 absorbance ratio of 1.8-2.0 and an A260/230 absorbance ratio of 2.0-2.4. DNA preparations (3-5 µg) were further cleaned using the MoBio PowerClean DNA Cleanup Kit (Cat# 12877-50). Approximately 30% of the DNA was lost in this additional clean-up step.
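The acceptance ranges and the clean-up loss quoted above can be expressed as a short check. The following is a minimal sketch rather than part of the original workflow: the function names and the example readings are hypothetical, and the 30% loss is treated as a fixed fraction even though the text gives it only as an approximate figure.

```python
def passes_purity_check(a260_280: float, a260_230: float) -> bool:
    """Return True when both absorbance ratios fall inside the ranges
    given in the text (A260/280 of 1.8-2.0, A260/230 of 2.0-2.4)."""
    return 1.8 <= a260_280 <= 2.0 and 2.0 <= a260_230 <= 2.4


def expected_recovery_ug(input_ug: float, loss_fraction: float = 0.30) -> float:
    """Estimate the DNA mass (µg) left after a clean-up step.

    loss_fraction defaults to the ~30% loss reported in the text;
    real losses vary from sample to sample.
    """
    return input_ug * (1.0 - loss_fraction)


if __name__ == "__main__":
    # Hypothetical spectrophotometer readings, for illustration only.
    print("purity OK:", passes_purity_check(1.85, 2.10))
    for mass_ug in (3.0, 5.0):
        print(f"{mass_ug:.1f} µg in, ~{expected_recovery_ug(mass_ug):.1f} µg out")
```

Under these assumptions, the 3-5 µg input range corresponds to roughly 2.1-3.5 µg after clean-up.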
Samples were fragmented to the 12-15 kb size range using G-tubes (Covaris Inc., Cat# 520079) in order to optimize the yield of HiFi reads. Following fragmentation, DNA was concentrated using AMPure beads (1:1 bead:sample ratio), and approximately 3 µg of clean DNA was used for SMRTbell library construction. The library construction steps were DNA damage repair, end repair/A-tailing, SMRTbell adapter ligation, and ExoIII/ExoVII nuclease treatment. This procedure typically resulted in a functional SMRTbell library (~40% yield from the input amount). The final library (~1.6 µg) was size-selected on the SageELF instrument (Cat# ELD 7510), using 0.75% agarose gel cassettes and the 1-18 kb v2 cassette definition program. The desired SageELF fractions were cleaned using AMPure magnetic beads (0.6:1.0 bead:sample ratio) and eluted in 15 µL of 10 nM Tris HCl, pH 8.0. Library fragment size was estimated on the Agilent TapeStation (genomic DNA tapes), and these data were used to calculate molar concentrations.
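The conversion from a measured mass concentration and an estimated fragment size to a molar concentration can be sketched as follows. This is an illustrative calculation, not the authors' script: it assumes the commonly used average mass of about 660 g/mol per base pair of double-stranded DNA, and the function name and example values are hypothetical.

```python
AVG_BP_MASS_G_PER_MOL = 660.0  # assumed average mass of one dsDNA base pair


def dsdna_molarity_nm(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA mass concentration (ng/µL) to molarity (nM).

    1 ng/µL equals 1e-3 g/L; dividing by the fragment's molar mass
    (660 g/mol per bp times length in bp) gives mol/L, and multiplying
    by 1e9 gives nM. The constants combine into the factor 1e6.
    """
    if conc_ng_per_ul < 0 or mean_fragment_bp <= 0:
        raise ValueError("concentration must be >= 0 and fragment size > 0")
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS_G_PER_MOL * mean_fragment_bp)


if __name__ == "__main__":
    # Hypothetical library: 50 ng/µL with a 15 kb mean fragment size.
    print(f"{dsdna_molarity_nm(50.0, 15_000):.2f} nM")
```

Because molarity scales inversely with fragment length, an error in the size estimate carries through to the calculated molarity in inverse proportion, which is why the fragment size is measured rather than assumed from the shearing target.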

Assembly and analysis
Three low-coverage contigs were identified as Firmicutes: ptg000100l, ptg000179l, and ptg000214l.
We also identified contig ptg000097l as mitochondrial DNA based on sequence similarity with other insect mitochondrial genes.

Figure S1. Genome size and heterozygosity assessed from raw read data using K-Mer counter and plotted using GenomeScope 2.0.

Figure S2. Contamination check displayed using Blopblot 1.0.
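The contig screening described in this section (three Firmicutes contigs and one mitochondrial contig) implies a partitioning step on the assembly FASTA. The sketch below shows one way such a step could look; it is not the authors' pipeline. The contig IDs come from the text, while the function names, the toy sequences, and the placeholder ID `ptg000001l` are hypothetical. The parser assumes a well-formed FASTA in which every header carries an ID.

```python
import io
from typing import Dict, Iterable, Iterator, TextIO, Tuple

# Contig IDs reported in the text.
CONTAMINANT_IDS = {"ptg000100l", "ptg000179l", "ptg000214l"}  # Firmicutes
MITO_IDS = {"ptg000097l"}  # mitochondrial


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (contig_id, sequence) pairs from a FASTA text stream."""
    name, chunks = None, []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the ID, dropping any description after whitespace.
            name, chunks = line[1:].split()[0], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def partition_contigs(
    records: Iterable[Tuple[str, str]],
) -> Dict[str, Dict[str, str]]:
    """Sort contigs into nuclear, mitochondrial, and contaminant bins."""
    bins: Dict[str, Dict[str, str]] = {
        "nuclear": {}, "mitochondrial": {}, "contaminant": {},
    }
    for name, seq in records:
        if name in CONTAMINANT_IDS:
            bins["contaminant"][name] = seq
        elif name in MITO_IDS:
            bins["mitochondrial"][name] = seq
        else:
            bins["nuclear"][name] = seq
    return bins


if __name__ == "__main__":
    # Toy sequences for illustration; a real run would open the assembly file.
    demo = io.StringIO(
        ">ptg000001l\nACGTACGT\nACGT\n"
        ">ptg000097l\nTTTTAAAA\n"
        ">ptg000100l\nGGGGCCCC\n"
    )
    for label, contigs in partition_contigs(read_fasta(demo)).items():
        print(label, sorted(contigs))
```

Keeping the mitochondrial contig in its own bin rather than discarding it preserves it for separate annotation, while the contaminant contigs are excluded from the nuclear assembly.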